
Conversation

@major (Contributor) commented Dec 15, 2025

Description

Wire up the /infer endpoint to Llama Stack for actual inference of RHEL Lightspeed requests (a rough sketch of the flow follows the list below):

  • Handle empty LLM responses with fallback message
  • Include request_id in all log statements for tracing
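A minimal sketch of that flow, assuming a FastAPI-style async handler; the constant, function names, and response shape below are illustrative stand-ins rather than the exact code in this PR:

```python
import logging
import uuid

logger = logging.getLogger(__name__)

# Stand-in for constants.UNABLE_TO_PROCESS_RESPONSE used by the real endpoint.
FALLBACK_MESSAGE = "Unable to process this request at the moment."


async def call_llm(question: str) -> str:
    """Placeholder for the Llama Stack call this PR adds (retrieve_simple_response)."""
    return ""  # an empty reply here exercises the fallback path below


async def infer(question: str) -> dict[str, str]:
    """Illustrative /infer flow: tag every log line with request_id, fall back on empty output."""
    request_id = str(uuid.uuid4())
    logger.info("request_id=%s received inference request", request_id)

    response_text = await call_llm(question)
    if not response_text:
        # Empty LLM response: return a fallback message instead of an empty body.
        logger.warning("request_id=%s LLM returned no content, using fallback", request_id)
        response_text = FALLBACK_MESSAGE

    logger.info("request_id=%s returning response", request_id)
    return {"request_id": request_id, "response": response_text}
```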

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude

Related Tickets & Documents

  • Related Issue RSPEED-2229

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

No changes to how tests are run.

Summary by CodeRabbit

  • New Features

    • Replaced placeholder inference with live LLM retrieval for user queries.
    • Added request/response logging and session-aware retrieval to improve reliability.
  • Bug Fixes

    • Returns HTTP 503 with a service-unavailable payload when the LLM backend is unreachable.
    • Provides a fallback response when the LLM yields no content.
    • Strips whitespace and validates blank questions before processing (see the validator sketch after this summary).
  • Tests

    • Expanded unit tests covering default model resolution, successful and empty LLM outputs, API errors, and validation.
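A hedged Pydantic sketch of the question validation mentioned under Bug Fixes; the real model is RlsapiV1InferRequest in src/models/rlsapi/requests.py, and the field and validator names here are illustrative:

```python
from pydantic import BaseModel, field_validator


class InferRequest(BaseModel):
    """Illustrative request model; the real one is RlsapiV1InferRequest."""

    question: str

    @field_validator("question")
    @classmethod
    def strip_and_require_question(cls, value: str) -> str:
        # Strip surrounding whitespace, then reject questions that end up blank.
        stripped = value.strip()
        if not stripped:
            raise ValueError("question must not be empty or whitespace-only")
        return stripped
```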


@coderabbitai bot (Contributor) commented Dec 15, 2025

Walkthrough

Added default-model resolution and a stateless retrieval flow for the /infer endpoint: new helpers fetch configured provider/model, create a temporary Llama Stack agent, send a UserMessage, extract the response (with safety/fallbacks), and map API connection/configuration failures to HTTP 503.
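A sketch of that retrieval flow. The helper names (AsyncLlamaStackClientHolder, get_temp_agent, content_to_str, _get_default_model_id) come from this PR, but the import paths, exact signatures, and create_turn arguments below are assumptions:

```python
from client import AsyncLlamaStackClientHolder        # src/client.py
from llama_stack_client.types import UserMessage
from utils.endpoints import get_temp_agent             # src/utils/endpoints.py
from utils.types import content_to_str                 # src/utils/types.py


async def retrieve_simple_response(question: str) -> str:
    """Send one UserMessage through a temporary agent and return the reply text."""
    client = AsyncLlamaStackClientHolder().get_client()
    # _get_default_model_id() is the helper added alongside this function in the PR;
    # it raises HTTPException(503) when no model is configured.
    model_id = _get_default_model_id()

    # Temporary, per-request agent; the return shape of get_temp_agent is assumed here.
    agent, session_id = await get_temp_agent(client, model_id)

    turn = await agent.create_turn(
        messages=[UserMessage(role="user", content=question)],
        session_id=session_id,
        stream=False,
    )

    # Defensive extraction: missing attributes fall through to an empty string,
    # which the endpoint later replaces with constants.UNABLE_TO_PROCESS_RESPONSE.
    output = getattr(turn, "output_message", None)
    content = getattr(output, "content", None)
    return content_to_str(content) if content else ""
```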

Changes

  • Endpoint implementation (src/app/endpoints/rlsapi_v1.py): Added _get_default_model_id() to read provider/model from configuration and raise 503 on misconfiguration; added retrieve_simple_response(question: str) to build a temporary agent, send a UserMessage, and extract response text; wired APIConnectionError handling and logging; updated infer_responses to include 503 and integrated retrieval flow with fallback to constants.UNABLE_TO_PROCESS_RESPONSE.
  • Unit tests (tests/unit/app/endpoints/test_rlsapi_v1.py): Expanded tests with fixtures mocking configuration, LLM success/empty responses, and APIConnectionError; added tests for default-model resolution (success/failure), retrieve_simple_response (success/empty/error), /infer endpoint behaviors (minimal/full requests, unique request_id, whitespace stripping, input validation), and 503 handling.
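A hedged pytest sketch of the default-model resolution tests summarized above; the real tests live in tests/unit/app/endpoints/test_rlsapi_v1.py, and the import path, fixture usage, and example model names here are assumptions:

```python
import pytest
from fastapi import HTTPException

from app.endpoints import rlsapi_v1  # import path assumed


def test_get_default_model_id_success(mocker):
    """Provider and model from configuration are combined into 'provider/model'."""
    mock_config = mocker.Mock()
    mock_config.inference.default_provider = "openai"
    mock_config.inference.default_model = "gpt-4o-mini"
    mocker.patch.object(rlsapi_v1, "configuration", mock_config)

    assert rlsapi_v1._get_default_model_id() == "openai/gpt-4o-mini"


def test_get_default_model_id_missing_config(mocker):
    """A missing inference configuration is surfaced as HTTP 503."""
    mock_config = mocker.Mock()
    mock_config.inference = None
    mocker.patch.object(rlsapi_v1, "configuration", mock_config)

    with pytest.raises(HTTPException) as exc_info:
        rlsapi_v1._get_default_model_id()
    assert exc_info.value.status_code == 503
```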

Sequence Diagram

sequenceDiagram
    actor Client
    participant Endpoint as /infer Endpoint
    participant Config as Configuration
    participant Agent as Temp Agent (Llama Stack)
    participant LLM as LLM Service

    Client->>Endpoint: POST /infer (question)
    Endpoint->>Endpoint: Validate & strip question
    Endpoint->>Config: Resolve default model/provider
    alt Config failure
        Config-->>Endpoint: Error
        Endpoint-->>Client: HTTP 503 (ServiceUnavailable)
    else Config success
        Config-->>Endpoint: provider:model
        Endpoint->>Agent: Create temporary agent
        Endpoint->>Agent: Send UserMessage(question)
        Agent->>LLM: Forward user message
        LLM-->>Agent: Return Turn/response
        Agent-->>Endpoint: Response content
        alt Content present
            Endpoint-->>Client: HTTP 200 + LLM response
        else Empty content
            Endpoint-->>Client: HTTP 200 + UNABLE_TO_PROCESS_RESPONSE
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Check correct mapping of configuration errors and APIConnectionError to HTTP 503
  • Verify temporary agent creation/teardown and async handling in retrieve_simple_response
  • Review response extraction and fallback to constants.UNABLE_TO_PROCESS_RESPONSE
  • Inspect new unit tests for adequate coverage and realistic mocks

Suggested labels

ok-to-test

Suggested reviewers

  • tisnik

Pre-merge checks

✅ Passed checks (3 passed)

  • Description Check (✅ Passed): Check skipped - CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title 'feat(rlsapi): implement LLM integration for v1 /infer endpoint' accurately and specifically describes the main change: wiring the /infer endpoint to Llama Stack for actual LLM inference.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@openshift-ci bot commented Dec 15, 2025

Hi @major. Thanks for your PR.

I'm waiting for a github.com member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Wire up the /infer endpoint to Llama Stack for actual inference:

- Handle empty LLM responses with fallback message
- Include request_id in all log statements for tracing

Signed-off-by: Major Hayden <major@redhat.com>
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)

46-75: Consider using AsyncMock for async method mocks.

mock_agent.create_turn is awaited in the implementation, but the mock uses a regular Mock. While Python 3.8+ handles this, using AsyncMock is more explicit and aligns with the coding guidelines.

 def mock_llm_response_fixture(
     mocker: MockerFixture, prepare_agent_mocks: AgentFixtures
 ) -> None:
     """Mock the LLM integration for successful responses."""
     mock_client, mock_agent = prepare_agent_mocks
+    
+    # Use AsyncMock for async methods
+    mock_agent.create_turn = mocker.AsyncMock()

     # Create mock output message with content
     mock_output_message = mocker.Mock()
     mock_output_message.content = "This is a test LLM response."

     # Create mock turn response
     mock_turn = mocker.Mock()
     mock_turn.output_message = mock_output_message
     mock_turn.steps = []
     mock_agent.create_turn.return_value = mock_turn

As per coding guidelines, async methods should use AsyncMock for mocking.
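For reference, a minimal pytest-mock pattern for mocking an awaited method with AsyncMock (fixture and attribute names here are illustrative, and pytest-asyncio is assumed for the async test):

```python
import pytest


@pytest.fixture(name="mock_agent")
def mock_agent_fixture(mocker):
    """Agent whose awaited create_turn returns a canned turn object."""
    mock_turn = mocker.Mock()
    mock_turn.output_message.content = "This is a test LLM response."
    mock_turn.steps = []

    agent = mocker.Mock()
    # AsyncMock makes `await agent.create_turn(...)` work and records the await.
    agent.create_turn = mocker.AsyncMock(return_value=mock_turn)
    return agent


@pytest.mark.asyncio
async def test_create_turn_is_awaited(mock_agent):
    """The awaited mock returns the canned content and records the call."""
    turn = await mock_agent.create_turn(messages=[], session_id="s1")
    assert turn.output_message.content == "This is a test LLM response."
    mock_agent.create_turn.assert_awaited_once()
```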

src/app/endpoints/rlsapi_v1.py (1)

82-119: Consider documenting HTTPException in the Raises section.

The function can also raise HTTPException (503) via _get_default_model_id() if configuration is missing. Adding this to the docstring would provide complete documentation of possible exceptions.

     Raises:
         APIConnectionError: If the Llama Stack service is unreachable.
+        HTTPException: 503 if no model is configured.
     """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a2a30f and 564434e.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py (4 hunks)
  • tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
tests/{unit,integration}/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/unit/app/endpoints/test_rlsapi_v1.py
src/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., from auth import get_auth_dependency)
All modules must start with descriptive docstrings explaining their purpose
Use logger = logging.getLogger(__name__) pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (str | int) and Optional[Type] or Type | None
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use async def for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: Configuration for config classes, Error/Exception for exceptions, Resolver for strategy patterns, Interface for abstract base classes
Abstract classes must use ABC with @abstractmethod decorators
Include complete type annotations for all class attributes in Python classes
Use import logging and module logger pattern with standard log levels: debug, info, warning, error

Files:

  • src/app/endpoints/rlsapi_v1.py
src/app/endpoints/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/rlsapi_v1.py
src/**/{client,app/endpoints/**}.py

📄 CodeRabbit inference engine (CLAUDE.md)

Handle APIConnectionError from Llama Stack in integration code

Files:

  • src/app/endpoints/rlsapi_v1.py
🧬 Code graph analysis (2)
tests/unit/app/endpoints/test_rlsapi_v1.py (4)
src/app/endpoints/rlsapi_v1.py (3)
  • _get_default_model_id (49-79)
  • infer_endpoint (124-182)
  • retrieve_simple_response (82-119)
src/configuration.py (3)
  • configuration (73-77)
  • AppConfig (39-181)
  • inference (134-138)
src/utils/suid.py (1)
  • check_suid (19-54)
src/models/rlsapi/requests.py (1)
  • RlsapiV1InferRequest (123-200)
src/app/endpoints/rlsapi_v1.py (5)
src/client.py (2)
  • AsyncLlamaStackClientHolder (18-55)
  • get_client (49-55)
src/models/config.py (1)
  • Action (755-808)
src/models/rlsapi/requests.py (1)
  • RlsapiV1InferRequest (123-200)
src/utils/endpoints.py (1)
  • get_temp_agent (387-423)
src/utils/types.py (1)
  • content_to_str (20-40)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (12)
tests/unit/app/endpoints/test_rlsapi_v1.py (8)

1-32: LGTM!

Imports are well-organized, and the pylint disables are appropriate for testing private functions and fixture-based side effects.


35-43: LGTM!

The configuration fixture correctly extends the minimal config and patches it at the point of use.


78-103: LGTM!

The fixture correctly simulates an empty LLM response for testing the fallback path. The same AsyncMock suggestion from the previous fixture applies here as well.


106-116: LGTM!

The fixture properly simulates APIConnectionError for testing error handling. While the real exception would originate from network operations rather than get_client(), this approach is valid for unit testing the exception handling flow.
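A sketch of how that exception path can be asserted at the endpoint level; the infer_endpoint call signature is simplified here, and the patched names are assumptions based on this PR:

```python
import pytest
from fastapi import HTTPException
from llama_stack_client import APIConnectionError

from app.endpoints import rlsapi_v1  # import path assumed


@pytest.mark.asyncio
async def test_infer_returns_503_when_backend_unreachable(mocker):
    """APIConnectionError raised during retrieval should surface as HTTP 503."""
    mocker.patch.object(
        rlsapi_v1,
        "retrieve_simple_response",
        new=mocker.AsyncMock(
            side_effect=APIConnectionError(request=mocker.Mock())  # type: ignore
        ),
    )
    infer_request = mocker.Mock()
    infer_request.question = "How do I check the SELinux status?"

    with pytest.raises(HTTPException) as exc_info:
        # The real endpoint may take additional dependencies (auth, etc.).
        await rlsapi_v1.infer_endpoint(infer_request)
    assert exc_info.value.status_code == 503
```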


119-163: LGTM!

Good test coverage for _get_default_model_id() with parameterized error cases. The tests properly verify both success and failure paths with appropriate assertions.


166-193: LGTM!

Comprehensive async tests for retrieve_simple_response() covering success, empty output, and error propagation scenarios.


196-279: LGTM!

Excellent test coverage for infer_endpoint() including minimal/full requests, unique request ID generation, 503 error handling, and fallback behavior for empty LLM responses.


282-295: LGTM!

Good validation tests ensuring the request model properly rejects empty/whitespace questions and strips whitespace from valid questions.

src/app/endpoints/rlsapi_v1.py (4)

1-35: LGTM!

Module setup follows coding guidelines with proper docstring, logger pattern, and organized imports.


38-46: LGTM!

The 503 response correctly documents the service unavailability scenario in the OpenAPI schema.


49-79: LGTM!

Well-structured helper function with proper error handling, descriptive docstring, and appropriate HTTP 503 responses for configuration issues.


122-182: LGTM!

Solid endpoint implementation with:

  • Proper async handling and type annotations
  • Unique request_id generation for tracing
  • Comprehensive error handling (503 for APIConnectionError)
  • Fallback for empty LLM responses
  • Good logging with request_id context throughout

Signed-off-by: Major Hayden <major@redhat.com>
@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/app/endpoints/rlsapi_v1.py (1)

49-80: Use ServiceUnavailableResponse for consistency.

The function manually constructs HTTPException with a custom detail dict, while infer_endpoint uses ServiceUnavailableResponse for similar 503 errors. This inconsistency makes error handling harder to maintain and may result in different response formats.

Apply this diff to use ServiceUnavailableResponse consistently:

 def _get_default_model_id() -> str:
     """Get the default model ID from configuration.
 
     Returns the model identifier in Llama Stack format (provider/model).
 
     Returns:
         The model identifier string.
 
     Raises:
         HTTPException: If no model can be determined from configuration.
     """
     if configuration.inference is None:
         msg = "No inference configuration available"
         logger.error(msg)
-        raise HTTPException(
-            status_code=503,
-            detail={"response": "Service configuration error", "cause": msg},
-        )
+        response = ServiceUnavailableResponse(
+            backend_name="Configuration",
+            cause=msg,
+        )
+        raise HTTPException(**response.model_dump())
 
     model_id = configuration.inference.default_model
     provider_id = configuration.inference.default_provider
 
     if model_id and provider_id:
         return f"{provider_id}/{model_id}"
 
     msg = "No default model configured for rlsapi v1 inference"
     logger.error(msg)
-    raise HTTPException(
-        status_code=503,
-        detail={"response": "Service configuration error", "cause": msg},
-    )
+    response = ServiceUnavailableResponse(
+        backend_name="Configuration",
+        cause=msg,
+    )
+    raise HTTPException(**response.model_dump())
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 564434e and f0d9dbc.

📒 Files selected for processing (2)
  • src/app/endpoints/rlsapi_v1.py (4 hunks)
  • tests/unit/app/endpoints/test_rlsapi_v1.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (5): same path-based instructions as in the previous review.
🧬 Code graph analysis (1)
tests/unit/app/endpoints/test_rlsapi_v1.py (1)
src/app/endpoints/rlsapi_v1.py (3)
  • _get_default_model_id (49-79)
  • infer_endpoint (125-183)
  • retrieve_simple_response (82-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-pr
🔇 Additional comments (8)
tests/unit/app/endpoints/test_rlsapi_v1.py (5)

3-4: LGTM: Appropriate pylint disables for unit tests.

The disables for protected-access and unused-argument are standard for unit tests that need to test private functions and use fixtures.


35-43: LGTM: Well-structured configuration fixture.

The fixture properly extends the minimal config with inference defaults and patches it into the module under test.


126-198: LGTM: Comprehensive test coverage for helper functions.

The tests cover success paths, error conditions, and edge cases for both _get_default_model_id and retrieve_simple_response. Good use of parametrization and proper async test markers.


203-299: LGTM: Thorough endpoint and validation testing.

The test suite comprehensively covers the endpoint behavior including:

  • Minimal and full-context requests
  • Unique request ID generation
  • API connection error handling (503 response)
  • Empty LLM response fallback
  • Input validation and whitespace stripping

110-121: Add a # type: ignore comment to the APIConnectionError instantiation for type-checking consistency.

This fixture instantiates APIConnectionError(request=mocker.Mock()) without the # type: ignore comment that all other similar usages in the codebase include. Add the comment on lines 114-115 to align with the pattern used throughout the test suite (e.g., test_info.py, test_models.py, test_rags.py).

⛔ Skipped due to learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
src/app/endpoints/rlsapi_v1.py (3)

8-46: LGTM: Imports and response documentation updated appropriately.

The new imports support LLM integration and the infer_responses mapping now includes the 503 ServiceUnavailableResponse for API unavailability scenarios.


82-121: LGTM: Well-structured stateless retrieval function.

The function properly handles LLM communication with defensive checks for missing attributes. The use of getattr prevents AttributeError when the response structure is incomplete.


123-183: LGTM: Robust endpoint implementation with proper error handling.

The endpoint correctly:

  • Generates unique request IDs for tracing
  • Handles APIConnectionError by returning HTTP 503 with ServiceUnavailableResponse
  • Provides fallback text for empty LLM responses
  • Includes comprehensive logging at all stages

As per coding guidelines, the endpoint properly handles APIConnectionError from Llama Stack.
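A condensed sketch of that mapping; the ServiceUnavailableResponse arguments mirror the review diff above, while its import path and the surrounding names are assumptions:

```python
import logging
from collections.abc import Awaitable, Callable

from fastapi import HTTPException
from llama_stack_client import APIConnectionError

from models.responses import ServiceUnavailableResponse  # import path assumed

logger = logging.getLogger(__name__)


async def with_503_mapping(
    retrieve: Callable[[str], Awaitable[str]], question: str, request_id: str
) -> str:
    """Translate an unreachable Llama Stack backend into HTTP 503.

    `retrieve` stands in for the retrieve_simple_response helper added in this PR.
    """
    try:
        return await retrieve(question)
    except APIConnectionError as exc:
        logger.error("request_id=%s Llama Stack backend unreachable: %s", request_id, exc)
        response = ServiceUnavailableResponse(backend_name="Llama Stack", cause=str(exc))
        raise HTTPException(**response.model_dump()) from exc
```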

@tisnik (Contributor) left a comment

Seems legit, thank you

@tisnik (Contributor) commented Dec 15, 2025

/ok-to-test

@tisnik merged commit 5e66f50 into lightspeed-core:main on Dec 16, 2025
19 of 26 checks passed